Class-Based n-gram Models of Natural Language
Authors
Abstract
We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
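For concreteness, the class-based bigram case factors word prediction through a hard assignment of each word to a single class: P(w | w_prev) = P(w | c(w)) * P(c(w) | c(w_prev)). The sketch below is an illustrative Python implementation under that assumption; the names `train_class_bigram`, `class_bigram_prob`, and the plain relative-frequency estimates are mine rather than the paper's, and a real system would add smoothing.

```python
from collections import defaultdict

def train_class_bigram(corpus, word2class):
    """Estimate a class-based bigram model by relative frequency.

    corpus     : list of sentences, each a list of word tokens
    word2class : hard assignment of each word to exactly one class id
    Returns (emit, trans) with emit[(c, w)] ~ P(w | c) and
    trans[(c_prev, c)] ~ P(c | c_prev).
    """
    emit_count, class_count = defaultdict(int), defaultdict(int)
    trans_count, hist_count = defaultdict(int), defaultdict(int)

    for sentence in corpus:
        classes = [word2class[w] for w in sentence]
        for w, c in zip(sentence, classes):
            emit_count[(c, w)] += 1
            class_count[c] += 1
        for c_prev, c in zip(classes, classes[1:]):
            trans_count[(c_prev, c)] += 1
            hist_count[c_prev] += 1

    emit = {cw: n / class_count[cw[0]] for cw, n in emit_count.items()}
    trans = {cc: n / hist_count[cc[0]] for cc, n in trans_count.items()}
    return emit, trans

def class_bigram_prob(w, w_prev, word2class, emit, trans):
    """P(w | w_prev) = P(w | c(w)) * P(c(w) | c(w_prev))."""
    c, c_prev = word2class[w], word2class[w_prev]
    return emit.get((c, w), 0.0) * trans.get((c_prev, c), 0.0)

# Tiny usage example with a hand-made hard clustering.
corpus = [["the", "dog", "runs"], ["the", "cat", "runs"], ["a", "dog", "sleeps"]]
word2class = {"the": "DET", "a": "DET", "dog": "N", "cat": "N",
              "runs": "V", "sleeps": "V"}
emit, trans = train_class_bigram(corpus, word2class)
print(class_bigram_prob("dog", "the", word2class, emit, trans))  # 2/3
```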
Similar Resources
CS 224N: Natural Language Processing
The objective of this project is to analyze the performance of a class-based language model and compare it to the performance of traditional n-gram language models. Class-based language models are well studied, as is the use of clustering to learn classes of words. However, it seems fairly standard across the literature to use hard clustering, i.e., to assign each word to a single class, and then to ...
Variable-length sequence language model for large vocabulary continuous dictation machine
In natural language, some sequences of words are very frequent. A classical language model, such as an n-gram model, does not adequately account for such sequences, because it underestimates their probabilities. A better approach consists of modeling word sequences as if they were individual dictionary elements. Sequences are considered as additional entries of the word lexicon, on which language mod...
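As an illustration of the idea above of promoting frequent word sequences to lexicon entries, the sketch below greedily merges word pairs that occur at least `min_count` times into single multi-word tokens. The function name, the count threshold, and the simple left-to-right merging are my own simplifications and not the selection criterion used in the cited paper.

```python
from collections import Counter

def merge_frequent_bigrams(sentences, min_count=50):
    """Rewrite a corpus so that frequent word pairs become single tokens,
    i.e. additional entries of the word lexicon."""
    pair_counts = Counter()
    for sent in sentences:
        pair_counts.update(zip(sent, sent[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}

    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            # Greedy left-to-right merge of one frequent pair at a time.
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in frequent:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged
```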
Semantic spaces for improving language modeling
Language models are crucial for many tasks in NLP (Natural Language Processing), and n-grams are the best way to build them. Huge effort is being invested in improving n-gram language models. By introducing external information (morphology, syntax, partitioning into documents, etc.) into the models, a significant improvement can be achieved. The models can, however, be improved with no external inf...
Experience with a Stack Decoder-Based HMM CSR and Back-Off N-Gram Language Models
Stochastic language models are more useful than non-stochastic models because they contribute more information than a simple acceptance or rejection of a word sequence. Back-off N-gram language models [11] are an effective class of word-based stochastic language model. The first part of this paper describes our experiences using the back-off language models in our time-synchronous decoder CSR...
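For concreteness, a backed-off bigram model can be sketched as follows. This toy version uses a fixed absolute discount rather than the Good-Turing discounts of Katz's scheme, and the function names and per-query normalization are illustrative only.

```python
from collections import Counter

def make_backoff_bigram(sentences, discount=0.75):
    """Bigram model that backs off to the unigram distribution for unseen pairs."""
    unigrams, bigrams, hist = Counter(), Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
        hist.update(sent[:-1])               # times each word occurs as a history
    total = sum(unigrams.values())
    followers = {}                           # history -> set of words seen after it
    for h, w in bigrams:
        followers.setdefault(h, set()).add(w)

    def prob(w, h):
        p_uni = unigrams[w] / total
        if h not in followers:               # unknown history: fall back to unigram
            return p_uni
        if bigrams[(h, w)] > 0:              # seen pair: discounted relative frequency
            return (bigrams[(h, w)] - discount) / hist[h]
        # Back off: spread the freed (discounted) mass over unseen continuations.
        freed = discount * len(followers[h]) / hist[h]
        unseen_mass = 1.0 - sum(unigrams[v] for v in followers[h]) / total
        return freed * p_uni / unseen_mass if unseen_mass > 0 else 0.0

    return prob
```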
Automatic Induction of N-Gram Language Models from a Natural Language Grammar
This paper details our work in developing a technique which can automatically generate class n-gram language models from natural language (NL) grammars in dialogue systems. The procedure eliminates the need for double maintenance of the recognizer language model and NL grammar. The resulting language model adopts the standard class n-gram framework for computational efficiency. Moreover, both t...
Automatic induction of n-gram language models from a natural language grammar
This paper details our work in developing a technique which can automatically generate class n-gram language models from natural language (NL) grammars in dialogue systems. The procedure eliminates the need for double maintenance of the recognizer language model and NL grammar. The resulting language model adopts the standard class n-gram framework for computational efficiency. Moreover, both t...
Journal: Computational Linguistics
Volume: 18
Pages: -
Publication date: 1992